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Abstract 

LZ77-based compression schemes compress the input text by re- 
placing factors in the text with an encoded reference to a previous 
occurrence formed by the couple (length, offset). For a given factor, 
the smallest is the offset, the smallest is the resulting compression 
ratio. This is optimally achieved by using the rightmost occurrence 
of a factor in the previous text. Given a cost function, for instance 
the minimum number of bits used to represent an integer, we define 
the Rightmost Equal-Cost Position (REP) problem as the problem of 
finding one of the occurrences of a factor which cost is equal to the 
cost of the rightmost one. We present the Multi-Layer Suffix Tree 
data structure that, for a text of length n, at any time i, it provides 
REP(LPF) in constant time, where LPF is the longest previous factor, 
i.e. the greedy phrase, a reference to the list of REP({set of prefixes 
of LPF}) in constant time and REP(p) in time 0(\p\ log logn) for any 
given pattern p. 



Introduction 

The Rightmost Equal-Cost Position (REP) is an occurrence, pointed out by 
its offset, w.r.t. the current position, which reference uses a number of bits 
that is equal to the number of bits of the offset of the rightmost occurrence 
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of a factor (substring) in the previous text. This problem mainly comes from 
the data compression field and particularly from the LZ77 based schemes, 
where the length and offset pair are used to encode a dictionary phrase. 

The foundational Ziv and Lempel LZ77 algorithm [22] is the basis of 
almost all the famous dictionary compressors, like gZip, PkZip, WinZip and 
7Zip. They consider a portion of the previous text as the dictionary, i.e. they 
use a dynamic dictionary formed by the set of all the factors of the text up 
to the current position within a sliding window of fixed size. A dictionary 
phrase refers to an occurrence of such phrase in the text by using the couple 
(length, offset), where the offset is the backward offset w.r.t. the current 
position. Since a phrase is usually repeated more than once along the text 
and since pointers with smaller offset have usually a smaller representation, 
the occurrence close to the current position is preferred. 

Usually the length and offset pair is encoded by using a variable length 
code such as Huffman code or Elias Delta code (see for instance Deflate, 
gZip, PkZip, LZRW4, LZMA and 7zip descriptions in [16]). Assuming that 
the lengths of encoded values is a monotonic non decreasing function, it is 
straightforward that the smaller is the value, the smaller is its encoded cost, 
i.e. the length in bits of the encoded value. We define a Rightmost Equal- 
Cost Position (REP) of w as the one of the occurrences of the factor w which 
cost is the same as the rightmost one, according to a fixed cost function. The 
REP values can be optimally used to represent a dictionary phrase instead of 
the rightmost occurrence. The cost function can be, for instance, the number 
of bits of the binary representation, i.e. the smaller integer greater than the 
log 2 (a;), where x is the value we have to encode. Later on in this paper we 
show how an example using the bit length of the Elias Gamma code as cost 
function. Furthermore, in LZ77 based compression, the greedy approach is 
used to parse the text into phrases, i.e, in an iterative way, the longest match 
between the dictionary and the forwarding text is chosen. This is commonly 
called the greedy phrase. Some LZ77-based algorithms as Deflate algorithm 
and the compressors based on them, like gZip and PkZip, use variants of 
the greedy approach to parse the text. Deflate64 algorithm implemented in 
WinZip and 7zip, contains some heuristics to parse differently the text in 
order to improve the compression ratio, but its time complexity was never 
clearly stated. 

The research about parsing optimality for dictionary compression pro- 
duced in the last decades some noticeable results (see for instance [2, 8, 10, 
12, 13, 20]). The greedy parsing is optimal for LZ77 dynamic dictionary 
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schemes in the case of fixed length encoding of length and offset pair values 
as showed in [2, 20], but it is not optimal, in terms of compression perfor- 
mance, when variable length code (VLC) are used to encode the pointers 
to dictionary phrases. Recently, some optimal parsing solutions have been 
presented for dynamic dictionary and VLC, see [3, 4, 5]. In [4] presented a 
solution for LZ77 dictionary and VLC, called Dictionary-Symbolwise Flexi- 
ble Parsing. It uses a graph-based model for the parsing problem where each 
node represent a position in the text and edges represent dictionary phrases. 
Edges are weighted according to the bit length of the encoded length and off- 
set pair. For details about this algorithm we refer to the original paper [4]. 
Unfortunately, this latter paper lacks of a practical solution for the problem 
of finding the smallest offset (the rightmost position) for a dictionary phrase. 
Even more, optimal parsing algorithms for the VLC case need to know, for 
any position in the text, the length and offset pairs of all the prefixes of the 
Longest Previous Factor (LPF) or, at least, those prefixes which occurrence 
have a different offset. 

The main goal of this work is to state the problem about the Rightmost 
Equal-cost Position (REP) and to present the Multilayer Suffix Tree as an 
efficient solution. In Section 1 we show how to reduce the problem of finding 
the rightmost position of a word w to the weaker REP (it;) problem. In Sec- 
tion 2 we present the new Multilayer Suffix Tree full-text index data structure 
that uses 0{n\ogn) amortized building time and linear space. In Section 3 
we show how to solve the REP problem for any pattern p in 0(\p\ log log n) 
time. Finally, in Section 4 we show how to use this new data structure, at 
any time i, to retrieve REP(LPF) and a pointer to the list of REP(SPF) in 
constant time, where LPF is the longest previous factor starting at position 
% and SPF is the set of prefixes of LPF, and, furthermore, we show some 
experimental results. 

1 Definition of the Problem 

Let Pos(w) C N the set of all the occurrences of w G Fact(T) in the text 
T G £*, where Fact{T) is the set of the factors of T. Let Offset(w) C N 
be the set of all the occurrence offsets of w G Fact(T) in the text T, i.e. 
x G Offset(w) iff x is the distance between the position of an occurrence of 
w and the end of the text T. For instance, given the text T = babcabbababb 
of length |T| = 12 and the factor w = abb of length \w\ = 3, the set of 
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positions of w over T is Pos(w) = {4, 9}. The set of the offsets of w over T 
is Offset(w) = {7,2}. Notice that x G Offset(w) iff exists y G Pos(w) such 
that x = \T\ — y — 1. Since the offsets are function of positions, there is a 
bijection between Pos(w) and Offset(w), for any factor w. 

Given a number encoding method, let Bitlen : N — > N a function that 
associates to a number x the length in bit of the encoding of x. Let us 
consider the equivalence relation having equal codeword bit-length on the 
set Offset(w). The numbers x,y G Offset(w) are bit-length equivalent i/f 
Bitlen(x) = Bitlen(y). Let us notice that the having equal codeword bit- 
length relation induces a partition on Offset{w). 

Definition 1.1. Given a word w G £*, a text T = a\...a n G £*, the 
rightmost occurrence of w over T is the occurrence of w that appears closest 
to the end of the text, if w appears at least once in the text T, otherwise it 
is not defined. More formally, 



riahtmovKw) - I mhl{x 1 X G ff se < w )} if Offset(w) ^ 
rigntmost{w) - j ^ defined . f Qffset ^ = 

Let us notice that referring to the rightmost occurrence of a word in a 
dynamic setting, where the input text is processed left to right, corresponds to 
referring to the rightmost occurrence over the text already processed. Indeed, 
if at a certain algorithm step we have processed the first % symbols of the text, 
the rightmost occurrence of w is the occurrence of w closest to the position 
% of the text. 

Definition 1.2. Let rightmost ^w) be the rightmost occurrence of w over Tj, 
where Tj is the prefix of the text T ending at the position % in T. Obviously, 
rightmost n (w) = rightmost{w) for \T\ = n. 

In many practical data compression algorithms the text we are able to 
refer to is just a portion of the whole text. Let T[j : i\ be the factor of 
the text T starting from the position j and ending to the position i. We 
generalize the definition of rightmost (w) over a factor T[j : i] of T as follows. 

Definition 1.3. Let rightmost j^w) be the rightmost occurrence of w over 
T[j : i], where T[j : i\ is the factor of the text T starting at the position j 
and ending at the position % of length + Obviously, rightmost ln (w) = 
rightmost{w) for \T\ = n. 
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Definition 1.4. The Rightmost Equal-Cost Position (REP). Let us 

suppose that we have a text T G £*, a pattern idgS* and a point z in time. 
If w appears at least once in TJ, then a rightmost equal-cost position of 
REP(w), over Tj is a position j in the text which offset is in [rightmost ^w)], 
where [rightmost ^w)] is the equivalence class induced by the relation having 
equal codeword bit-length containing the element rightmost ^w). Otherwise, 
when w does not appear in T h the REP(w) = 0. 

2 The Multilayer Suffix Tree 

The main idea of this new data structure, is based on two observations. 
The first one is that the equivalence relation having equal codeword bit-length 
induces a partition on Offset(w), for any w, and also induces a partition 
on the set of all the possible offsets over a text T independently from any 
specific factor, i.e. on the set [l..|T|]. The second observation is that for 
any encoding method for which the Bitlen function is a monotonic function, 
each equivalence class in [l..|T|] is composed by contiguous points in [l..|T|]. 
Indeed, given a point p G [l..|T|], the equivalence class [p] is equal to the set 
[j..i], with j < p < i, j = min{x G [p]} and i = max{x G [p]}. 

Assuming the Bitlen function as the cost function of the dictionary phrases, 
we exploit the discreteness of Bitlen in order to solve the REP problem by 
using a set of suffix trees for sliding window, each one devoted to one or more 
classes of equivalence of the relation having equal code bit-length. 

Fixed an encoding method for numbers and a text T, we assume that 
Bitlen is a monotonic function and that we know the set B — {bi, b 2 , • • • , b s }, 
with bi < b 2 < ■ ■ ■ < b s , that is the set of the Bitlen values of all the 
possible offsets over T. We define the set SW = {swi, s«i 2 , • • • , sw s }, with 
sw 1 < sw 2 < • • • < sw s , where swi is the greatest integer (smaller than or 
equal to the length of the text T) such that Bitlen(j) is less that or equal 
to hi. More formally, swi = maxjj < |T| | Bitlen(j) < bi}. Notice that 
sw s = \T\. 

All the standard non-unary representation of numbers satisfy the follow- 
ing property. 

Property 2.1. There exists a constant k > 1 and an integer k such that for 
any k < i < s one has swi > k sw^i. 
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The Huffman code do not strictly satisfy above property, but it easy to 
find a function f(w) always greater than Bitlen (Huffman code of w) that is a 
good approximation of it and it does have the Property 2.1 or, alternatively, 
you can impose that property rearranging the Huffman trees. 

Let us consider an example where integers are encoded by using the Elias 
7 codes. The Table 1 reports the Elias 7 codes and the Bitlen values for 
integers from 1 to 18. Suppose, for instance, that our cost function is asso- 
ciated to this Elias codes and that we have a text T of length 18. The set 
B therefore is B = {b\ — 1, 62 — 3, 63 = 5, 64 = 7, 65 = 9} and we have that 
swi = 1, sw 2 = 3, sw 3 = 7, SW4 = 15 and sw 5 = 18. Notice that, indeed, 
Property 2.1 is satisfied for k = 2 and k = 2. 

Let us now introduce the Multilayer Suffix Tree data structure. We sup- 
pose that a text T is provided online and at time i the first % characters of T 
have been read, i.e. at time i the prefix Tj of length i of the text T has been 



i 


7(i) 


Bitlen('~f(i)) 


1 


1 


1 


2 


010 


3 


3 


Oil 


3 


4 


00100 


5 


5 


00101 


5 


6 


00110 


5 


7 


00111 


5 


8 


0001000 


7 


9 


0001001 


7 


10 


0001010 


7 


11 


0001011 


7 


12 


0001100 


7 


13 


0001101 


7 


14 


0001110 


7 


15 


0001111 


7 


16 


000010000 


9 


17 


000010001 


9 


18 


000010010 


9 



Table 1: Elias 7 code for integers from 1 to 18 and their Bitlen value. 
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read. 



Definition 2.1. The Multilayer Suffix Tree is a data structure composed by 
the set S = {S SWl , S SW2 , . . . , S SWa } of suffix trees where, for any a G SW and 
at any moment i, S a is the suffix tree for sliding window of Tj with sliding 
window of size a such that S a represents all the factors of T[i — a : i\. We 
call S a simply the layer a or the layer of size a. 

From now on we will refer to suffix trees or layers indifferently. 

We use the online suffix tree for sliding window construction algorithm 
introduced by Larson in [11] and later refined by Senft in [17, 18], in order 
to build each layer of our multilayer suffix tree. Let us recall that in [11] an 
online and linear time construction for the suffix tree is reported. The suffix 
tree uses linear space w.r.t. the size of the sliding window. Therefore, for 
any S a G S = {S SWl , S SW2 , . . . , S SWg }, S a is a full-text index for T[i — a : i], 
where T is a given text and i is a point in time. S a uses 0(a) space. 

In order to decrease the practical space requirement, we think that it is 
possible to adapt our data structure to work with other classic indexes for 
sliding window (see for instance [9, 14, 19]) or to use the recently presented 
compressed versions of the suffix tree (e.g. [7, 15]). 

Proposition 2.2. 1. If a pattern w is in layer a with a G SW, then w is 
also in layer j3 for any (3 G SW with a < (3. 2. If a pattern w is not in a 
layer a, a G SW , then w is not in layer (3 with (3 < a 

Proof. The proof of the property at point 1 comes immediately from suffix 
trees properties. Since layer a is a full text index for T[i — a : i] and layer 
beta is a full-text index for T[i — f3 : i], for any a, f3 G SW with a < j3 and 
for any i, T[i — a : i] is a suffix of T[i — (3 : i]. The property at point 2 can 
be deduced by point 1. □ 

Proposition 2.3. Fixed a text T of size \T\ = n, at any moment i with 
< i < n and for any standard variable-length code, the multilayer suffix 
tree uses 0{i) space. 

Proof. Since at time i the maximum offset of all the occurrences in the text 
Tj is 0(i), for any standard variable-length code the maximum value of the 
set SW is 0(i). Since Property 2.1 holds for the set SW and since the 
multilayer suffix tree space is equal to the sum of the space of its layers, as 
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an immediate consequence we have that space used by the multilayer suffix 
tree is 0( £ a) = 0(i). □ 

aeSW 



For instance, at time i, if we consider the usual binary representation of 
numbers, the values a G SW turn out to be powers of 2 from 1 to j, where 
j is the greatest power of 2 smaller than i, plus i, i.e. SW = {1, 2, 4, . . . j, i} 
and Yl a < 2j + i < 3i and 0( Yl a ) = 

aeSW aeSW 

From the linear time of the online construction of the suffix tree for 
sliding window and since the number of layers is \SW\ = 0(log \ T\), we can 
immediately state the following proposition. 

Proposition 2.4. Given a text T of length \T\ = n, for any standard 
variable-length code, it is possible to build online the multilayer suffix tree 
in 0(n log n) amortized time. 



3 Solving REP for a Generic Pattern 

We want now to show how to answer to the REP(p) for a given pattern p of 
length \p\ and a text Tj of length i in 0(\p\ log log i). 

Proposition 3.1. If a pattern p is in layer (3 and is not in layer a, where 
a is the maximum of the values in SW smaller than (3, then any occurrence 
of p in the layer (3 correctly solve the REP problem for the pattern p. 

Proof. If a pattern p is in layer (3 and is not in layer a, where a is the 
maximum of the values in SW smaller than (3, then from Prop. 2.2, it follows 
that (3 is the smallest layer where it appears p. Therefore p has at least one 
occurrence in T[i — j3 : i — a — 1], i.e. rightmost ^p) G (a. .(3}. Since (a.. j3] is 
the equivalence class [f3] of the having equal codeword bit-length relation, we 
have that any occurrence of p in the layer (3 correctly solve the REP problem 
for p. □ 

Remark. Let us notice that if S SWx is the smallest layer where rightmost ^w) 
appears, then the Bitlen value of the offset of rightmost ^w) is equal to b x . 

Using above proposition, we are able to solve the problem REP(p) once 
we find the smallest layer containing the rightmost occurrence of the pattern, 
if any, otherwise we just report 0. 
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Layer 1 



Figure 1: The Multilayer Suffix Tree for the text T = ababaa and for 
the Elias 7-code, where B = {1,3,5}, SW = {1,3,6}. The solid edges 
are regular ones and the dashed links are the suffix-links of internal nodes. 
For convenience, we added edge labels with the substring of T associated to 
edges. The node value is the position over the text of the incoming edge. 
We omitted the string depth of nodes. Let consider, for instance, the phrase 
w = l ba" with Pos(w) = {1,3}, Offset{w) = {4,2}, rightmost{w) = 2 and 
7(2) =010. Since w is in layer SW2 = 3 and is not in layer sw\ = 1, we have 
that Bitlen(rightmost(w)) = 3 is equal to Bitlen(sw2 = 3) = b 2 = 3. 

What follows is the trivial search of the smallest layer that contains an 
occurrence of a given pattern. 

Given a pattern p at time i, we look for p in the layer swi, i.e. the smallest 
layer. If p is in S SWl , then all the occurrences of p in T[i — sw\ : i] belong to 
the class of the rightmost occurrence of p over T. If p is not in S SW1 , then we 
look for any occurrence of p is in S SW2 , the next layer in increasing order. If p 
is in S SW2 , since it is not in S SW1 , for the Prop. 3.1, any occurrence of p in this 
layer belong to the rightmost occurrence of p over Continuing in this way, 
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as soon as we found an occurrence of p in a layer, this occurrence correctly 
answer to REP(p). By using this trivial approach we can find REP(p) in 
time proportional to the length of the pattern times the number of layers. 

Since many of the classic variable-length codes for integers, like the Elias's 
7-codes, produce codewords of length proportional to the logarithm of the 
represented value, we can assume that the cardinality of SW is 0(log|T|). 
Since that |Tj| = i, in the online fashion, we have that the time for REP(p) 
is 0(\p\ logi). 

A similar result can be obtained by using a variant of the Amir et al. 
algorithm presented [1], but it does not support REP(LPF) operations in 
constant time and it does not support sliding window over the text. 

Since Prop. 2.2 holds for the layers of our data structure, we can use the 
binary search to find the smallest layer containing a given pattern. Since 
\SW\ = O(logz), for any classic variable-length code, the number of layers 
in our structure is 0(\ogi) and the proof of following proposition comes 
straight forward . 

Proposition 3.2. Using any classic variable-length code, at any time i the 
multilayer suffix tree is able to answer to REP(pattern) for a given pattern 
in 0(| pattern | log log i) time. 

4 Solving REP(LPF) and REP(SPF) 

As pointed out in the Senft's paper [18], the online suffix tree construction 
algorithms in [6, 11, 21]) keep track of the longest repeated suffix of the text. 
We use those information, one from any layer of the multilayer suffix tree, 
to answer in constant time to the REP(LPF) problem and the REP(SPF), 
where LPF is the longest previous factor and SPF is the set of prefixes of 
LPF. 

Recall that, using Ukkonen's nomenclature, an Implicit Suffix Tree I k , 
1 < k < n, for the text T = a\ . . . a n is a Suffix Tree for the text T[l..fc] = 
Tfc. It is built starting from I k -\ by implicitly extending all the suffixes of 
Tk-\ that are unique, i.e. those that are leaves, and explicitly adding the 
suffixes of T k in decreasing order down to the longest suffix of T k that is 
already in Ik-i, i.e. the longest suffix of T k that appears at least twice in 
T k . In order to efficiently perform the explicit extensions, the construction 
algorithm maintains at each step a reference to the longest repeated suffix of 
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the text. Let us say that at the end of the step k the longest repeated factor 
is T[i ...&]. Eventually in some successive steps the longest repeated factor 
became T[i + 1 ... k'\. 

In order to solve the REP(LPF) problem in constant time, we let inde- 
pendently to grow each layer of the multilayer suffix tree to I(k'-i), where, for 
any layer, k' — 1 is the last step of the construction algorithm which longest 
repeated factor starts in position i. This does not change the time complex- 
ity of the algorithm, since the overall steps for a text T are the same. In 
the mean time, in order to correctly represent the factors of offset b a in each 
layer a, the sliding widow size are dynamically set to SW a + LPF a (i). 

Now that any layer point out to the longest repeated suffix starting in 
position i accordingly to the layer size, we just take in constant time, along 
the building process, from any layer one occurrence of its longest repeated 
suffix. Then, we keep the value coming from the smallest layer with a pointed 
factor of length equal to the length of the biggest layer, i.e. LPF(z). 

Proposition 4.1. Given a text T of length n in the dynamic setting, at 
time i, the multilayer suffix tree of the text T[l..i+LPF[i}] is able to solve 
REP(LPF) in constant time. 

Since LPF(z') is equal to the longest repeated suffix of T[l.i+LPF[i]] by 
the very definition of LPF, the proof of above proposition come straightfor- 
ward from Proposition 2.2. 

Concerning the occurrences of the set of prefixes of LPF (SPF for short), 
we are interested to those having a different bit length of their rightmost 
position, as the missing ones are just prefix of an occurrence of a longer one. 
With arguments similar to the ones used for the LPF case, we maintain inside 
the multilayer suffix tree a list SPF of (length, offset) pairs in the following 
way. At any time i, let occ be the position of LPF(i) found in the largest 
layer. The pair (LPF [z], occ) is added to the empty list SPF. Let LPF Q and 
occ a be the length and the offset of the longest repeated suffix pointed out by 
the layer a. For any a in decreasing order if the last pair in the list SPF has 
length value equal to LPF Q , then this element is updated with the offset value 
occ a . Otherwise the pair (LPF a , occ a ) is appended to the list. Notice that 
the size of the list is smaller that or equal to the number of layers O(logi) 
as already assumed. In order to solve the REP (SPF) problem it suffice to 
retrieve a pointer to the internal SPF list of the multilayer suffix tree. 

The proof of the following proposition is straightforward. 
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Proposition 4.2. Given a text T of length n in the dynamic setting, at 
time i, the multilayer suffix tree of the text T[l..i+LPF[i]} is able to solve 
REP(SPF) in constant time. 

At any time i, since any REP(SPF) value has equal-cost w.r.t. the right- 
most occurrence of the corresponding LPF prefix, REP(SPF) can be used 
as dictionary pointers for the dictionary phrases matching the factors of the 
text starting at position i in any LZ77 based compression algorithm. Due to 
space constrain we omit the details of the next proposition. 

Proposition 4.3. Given a text T, using at any time i the REP(SPF) values 
provided by the multilayer suffix tree, one can correctly build the parsing graph 
G' A T of the Dictionary-Symbolwise Flexible Parsing algorithm. 

We present some experimental results in order to evaluate the practical 
requirements in terms of space and time. We used as cost function just the 
length of the binary representation of numbers and then sliding windows of 
size equal to the increasing powers of 2 up to a fixed length MaxDictionary- 
Size equal to 2 24 . The space grows up to 35 Bytes times MaxDictionarySize 
for regular text file like the bible and the enwik8 ones (see Table 2 caption 
for more details), as our implementation uses 17 Byte in average per char- 
acter for any layer. We compared the multilayer suffix tree (MLST) running 



Input 


Length 


MLST 


RMST 


A 


bible 


4 MB 


8.52 fis 


1.77 fis 


4.81 


enwik8 


100 MB 


16.15 fj,s 


3.00 fis 


5.38 


dna 


104 MB 


8.51 fis 


2.19 fis 


3.88 



Table 2: Execution time table for different input. The bible file belongs to 
the Large Canterbury Corpus, the enwk8 file contains the first 100 MB of 
the English Wikipedia database and the dna file belongs to the Repetitive 
Pizza&Chili Corpus. The MLST column report the total time per character 
of the construction of the Multilayer Suffix Tree with max window size of 4 
MB. The RMST column report the building time per character of one single 
suffix tree with sliding window of 4 MB. The A column contains the ratio 
between MLST and RMST times. 
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time with the running time of a suffix tree with sliding window of size Max- 
Dictionary Size maintaining the rightmost occurrence of any internal node 
(RMST). At any time i, the position value of internal nodes in the path from 
the root to the point of insertion of the new leaf are updated. The average 
running time of the MLST is about 5 times greater than the RMST time, 
as shown in Table 2. Furthermore, since all the layers in the multilayer suf- 
fix tree are independent each other, it is easy to speed up the overall time 
by using a parallel handling of the layers. Moreover, since maintaining the 
rightmost position in RMST runs in 0(n log n) average case and 0(n 2 ) in 
the worst case while the MLST run time is O(nlogn) in the worst case, the 
use of the MLST is encouraged when the dictionary size become large and/or 
the text contains highly repetitive factors, e.g. using molecular biology data, 
as the A value for the dna file suggests. 
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